GENEVAL: A Proposal for Shared-task Evaluation in NLG
Abstract
We propose to organise a series of shared-task NLG events in which participants are asked to build systems with similar input/output functionality, and these systems are evaluated with a range of different evaluation techniques. The main purpose of these events is to allow us to compare the evaluation techniques themselves, by correlating the results of the different evaluations across the systems entered in the events.
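As a rough illustration of the core idea (not part of the original proposal): given per-system scores from two evaluation techniques applied to the same set of systems, their agreement can be measured with a rank correlation. A minimal sketch follows; the system scores are invented for the example, and only scipy.stats.spearmanr is assumed.

```python
# Minimal sketch: correlating two evaluation techniques over the same systems.
# The scores below are hypothetical placeholders; an actual shared-task event
# would substitute the results produced by its own evaluation regimes.
from scipy.stats import spearmanr

# Hypothetical per-system results, e.g. human ratings vs. an automatic metric,
# with one value per participating NLG system (same order in both lists).
human_ratings = [4.2, 3.8, 2.9, 4.5, 3.1]
automatic_scores = [0.61, 0.55, 0.40, 0.66, 0.47]

rho, p_value = spearmanr(human_ratings, automatic_scores)
print(f"Spearman rank correlation: {rho:.2f} (p = {p_value:.3f})")
```

A high rank correlation would suggest that the cheaper technique ranks systems much as the more expensive one does, which is the kind of evidence such events are meant to gather.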
Similar Resources
Pragmatic Influences on Sentence Planning and Surface Realization: Implications for Evaluation
Three questions to ask of a proposal for a shared evaluation task are: whether to evaluate, what to evaluate and how to evaluate. For NLG, shared evaluation resources could be a very positive development. In this statement I address two issues related to the what and how of evaluation: establishing a “big picture” evaluation framework, and evaluating generation in context.
A Repository of Data and Evaluation Resources for Natural Language Generation
Starting in 2007, the field of natural language generation (NLG) has organised shared-task evaluation events every year, under the Generation Challenges umbrella. In the course of these shared tasks, a wealth of data has been created, along with associated task definitions and evaluation regimes. In other contexts too, sharable NLG data is now being created. In this paper, we describe the onlin...
Automatic Evaluation of Referring Expression Generation Is Possible
Shared evaluation metrics and tasks are now well established in many fields of Natural Language Processing. However, the Natural Language Generation (NLG) community is still lacking common methods for assessing and comparing the quality of systems. A number of issues that complicate automatic evaluation of NLG systems have been discussed in the literature. The most fundamental observation in ...
Introducing Shared Task Evaluation to NLG: The TUNA Shared Task Evaluation Challenges
Shared Task Evaluation Challenges (STECs) have only recently begun in the field of NLG. The TUNA STECs, which focused on Referring Expression Generation (REG), have been part of this development since its inception. This chapter looks back on the experience of organising the three TUNA Challenges, which came to an end in 2009. While we discuss the role of the STECs in yielding a substantial bod...
Evaluation in Natural Language Generation: The Question Generation Task
Question Generation (QG) is proposed as a shared-task evaluation campaign for evaluating Natural Language Generation (NLG) research. QG is a subclass of NLG that plays an important role in learning environments, information seeking, and other applications. We describe a possible evaluation framework for standardized evaluation of QG that can be used for black-box evaluation, for finer-grained e...
Publication year: 2006